Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables rapid exploration and large-scale data collection at low cost. However, prior work on sim-to-real transfer of robot policies typically does not involve any human-robot interaction, because accurately simulating human behavior is an open problem. In this work, we aim to leverage the power of simulation to train robot policies that are proficient at interacting with humans upon deployment. But there is a chicken-and-egg problem: how do we gather examples of a human interacting with a physical robot, so as to model human behavior in simulation, without already having a robot able to interact with humans? Our proposed method, Iterative Sim-to-Real (i-S2R), attempts to address this. i-S2R bootstraps from a simple model of human behavior and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are refined. We evaluate our method in a real-world robotic table tennis setting, where the robot's objective is to play cooperatively with a human player for as long as possible. Table tennis is a high-speed, dynamic task that requires the two players to react quickly to each other's moves, making it a challenging test bed for studying human-robot interaction. We present results on an industrial robot arm that is able to play cooperative table tennis with human players, achieving rallies of 22 successive hits on average and 150 at best. Furthermore, for 80% of players, rally lengths are 70% to 175% longer compared to a sim-to-real (S2R) baseline. For videos of our system in action, see https://sites.google.com/view/is2r.
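To make the alternation concrete, the following minimal sketch mirrors the i-S2R loop described above. It is an illustration only: train_in_sim, collect_real_rallies, and fit_human_model are stand-in placeholders, not the authors' code, and the "human model" is reduced to a single statistic.

import random

def train_in_sim(human_model, policy):
    # Stand-in for reinforcement learning against the simulated human model.
    skill = 0.0 if policy is None else policy["skill"]
    return {"skill": skill + human_model["coverage"]}

def collect_real_rallies(policy, n=10):
    # Stand-in for deploying on the physical robot and logging real rallies.
    return [random.gauss(policy["skill"], 1.0) for _ in range(n)]

def fit_human_model(data):
    # Refine the human behavior model from all real-world data gathered so far.
    return {"coverage": sum(data) / len(data)}

human_model, policy, real_data = {"coverage": 1.0}, None, []
for _ in range(5):  # alternate simulation training and real-world deployment
    policy = train_in_sim(human_model, policy)
    real_data += collect_real_rallies(policy)
    human_model = fit_human_model(real_data)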
We present a machine-learning framework to accurately characterize morphologies of Active Galactic Nucleus (AGN) host galaxies within $z<1$. We first use PSFGAN to decouple host galaxy light from the central point source, then we invoke the Galaxy Morphology Network (GaMorNet) to estimate whether the host galaxy is disk-dominated, bulge-dominated, or indeterminate. Using optical images from five bands of the HSC Wide Survey, we build models independently in three redshift bins: low $(0<z<0.25)$, medium $(0.25<z<0.5)$, and high $(0.5<z<1.0)$. By first training on a large number of simulated galaxies, then fine-tuning using far fewer classified real galaxies, our framework predicts the actual morphology for $\sim 60\%-70\%$ of host galaxies from test sets, with a classification precision of $\sim 80\%-95\%$, depending on redshift bin. Specifically, our models achieve disk precision of $96\%/82\%/79\%$ and bulge precision of $90\%/90\%/80\%$ (for the 3 redshift bins), at thresholds corresponding to indeterminate fractions of $30\%/43\%/42\%$. The classification precision of our models has a noticeable dependency on host galaxy radius and magnitude; no strong dependency is observed on contrast ratio. On classifications of real AGNs, our models agree well with traditional 2D fitting with GALFIT. The PSFGAN+GaMorNet framework does not depend on the choice of fitting functions or galaxy-related input parameters, runs orders of magnitude faster than GALFIT, and is easily generalizable via transfer learning, making it an ideal tool for studying AGN host galaxy morphology in forthcoming large imaging surveys.
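To illustrate how the indeterminate class trades completeness for precision, here is a small sketch (our own, not the released PSFGAN/GaMorNet code) that thresholds per-galaxy class probabilities and sets low-confidence cases aside:

import numpy as np

def classify(probs: np.ndarray, thresh: float = 0.8) -> np.ndarray:
    # probs: (N, 2) array of [p_disk, p_bulge] per host galaxy (assumed layout).
    # Galaxies in which neither class clears the threshold stay indeterminate;
    # raising thresh raises precision on the remaining classifications.
    labels = np.full(len(probs), "indeterminate", dtype=object)
    labels[probs[:, 0] >= thresh] = "disk"
    labels[probs[:, 1] >= thresh] = "bulge"
    return labels

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
print(classify(probs))  # ['disk' 'indeterminate' 'bulge']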
Wearable sensors for measuring head kinematics can be noisy due to imperfect interfaces with the body. Mouthguards are used to measure head kinematics during impacts in traumatic brain injury (TBI) studies, but deviations from reference kinematics can still occur due to potential looseness. In this study, deep learning is used to compensate for the imperfect interface and improve measurement accuracy. A set of one-dimensional convolutional neural network (1D-CNN) models was developed to denoise mouthguard kinematics measurements along three spatial axes of linear acceleration and angular velocity. The denoised kinematics had significantly reduced errors compared to reference kinematics, and reduced errors in brain injury criteria and tissue strain and strain rate calculated via finite element modeling. The 1D-CNN models were also tested on an on-field dataset of college football impacts and a post-mortem human subject dataset, with similar denoising effects observed. The models can be used to improve detection of head impacts and TBI risk evaluation, and potentially extended to other sensors measuring kinematics.
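The sketch below shows a minimal 1D-CNN denoiser of the kind described; the architecture (three convolutional layers, kernel size 7) is our assumption for illustration, not the study's exact models.

import torch
import torch.nn as nn

class KinematicsDenoiser(nn.Module):
    # One such model per spatial axis of linear acceleration / angular velocity.
    def __init__(self, channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, channels, kernel_size=7, padding=3),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Map a noisy mouthguard trace directly to a denoised estimate.
        return self.net(x)

model = KinematicsDenoiser()
noisy = torch.randn(8, 1, 200)  # batch of 200-sample impact windows
denoised = model(noisy)         # same shape as the input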
Crop type maps are critical for tracking agricultural land use and estimating crop production. Remote sensing has proven an efficient and reliable tool for creating these maps in regions with abundant ground labels for model training, yet these labels remain difficult to obtain in many regions and years. NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, originally designed for forest monitoring, has shown promise for distinguishing tall and short crops. In the current study, we leverage GEDI to develop wall-to-wall maps of short vs tall crops on a global scale at 10 m resolution for 2019-2021. Specifically, we show that (1) GEDI returns can reliably be classified into tall and short crops after removing shots with extreme view angles or topographic slope, (2) the frequency of tall crops over time can be used to identify months when tall crops are at their peak height, and (3) GEDI shots in these months can then be used to train random forest models that use Sentinel-2 time series to accurately predict short vs. tall crops. Independent reference data from around the world are then used to evaluate these GEDI-S2 maps. We find that GEDI-S2 performed nearly as well as models trained on thousands of local reference training points, with accuracies of at least 87% and often above 90% throughout the Americas, Europe, and East Asia. Systematic underestimation of tall crop area was observed in regions where crops frequently exhibit low biomass, namely Africa and South Asia, and further work is needed in these systems. Although the GEDI-S2 approach only differentiates tall from short crops, in many landscapes this distinction goes a long way toward mapping the main individual crop types. The combination of GEDI and Sentinel-2 thus presents a very promising path towards global crop mapping with minimal reliance on ground data.
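Step (3) amounts to supervised learning with GEDI-derived labels; a hedged sketch follows, in which the feature layout (a flattened Sentinel-2 band-by-month stack) and the synthetic arrays are our assumptions, not the paper's exact pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# X: one row per GEDI shot; columns are a Sentinel-2 time-series stack
# (e.g., 10 bands x 12 monthly composites = 120 features).
X = rng.normal(size=(1000, 120))
# y: 1 = tall crop, 0 = short crop, from classified GEDI returns in the
# months when tall crops are at peak height.
y = rng.integers(0, 2, size=1000)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
tall_prob = clf.predict_proba(X[:5])[:, 1]  # per-pixel tall-crop probability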
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, using either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Camera pose estimation is a key step in standard 3D reconstruction pipelines that operate on a dense set of images of a single object or scene. However, methods for pose estimation often fail when only a few images are available because they rely on the ability to robustly identify and match visual features between image pairs. While these methods can work robustly with dense camera views, capturing a large set of images can be time-consuming or impractical. We propose SparsePose for recovering accurate camera poses given a sparse set of wide-baseline images (fewer than 10). The method learns to regress initial camera poses and then iteratively refine them after training on a large-scale dataset of objects (Co3D: Common Objects in 3D). SparsePose significantly outperforms conventional and learning-based baselines in recovering accurate camera rotations and translations. We also demonstrate our pipeline for high-fidelity 3D reconstruction using only 5-9 images of an object.
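The regress-then-refine idea can be sketched as follows; this is an illustration under our own assumptions (a 9-dimensional pose made of a 6D rotation representation plus translation, generic per-image feature vectors), not the SparsePose architecture.

import torch
import torch.nn as nn

class PoseRefiner(nn.Module):
    def __init__(self, feat_dim: int = 256, pose_dim: int = 9):
        super().__init__()
        self.init_head = nn.Linear(feat_dim, pose_dim)
        self.refine_head = nn.Linear(feat_dim + pose_dim, pose_dim)

    def forward(self, feats: torch.Tensor, steps: int = 3) -> torch.Tensor:
        pose = self.init_head(feats)  # coarse initial pose per image
        for _ in range(steps):        # iterative refinement updates
            pose = pose + self.refine_head(torch.cat([feats, pose], dim=-1))
        return pose

feats = torch.randn(7, 256)   # features for 7 wide-baseline images
poses = PoseRefiner()(feats)  # (7, 9) refined camera poses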
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent neural networks that statistically whiten their inputs: one with direct recurrent connections and the other with interneurons that mediate the recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections, because the convergence time of the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective solved by the network with direct recurrent connections, so our results can be viewed as a recurrent-neural-network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear networks.
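For concreteness, the whitening objective both circuits solve can be stated generically (our notation, not necessarily the paper's): given zero-mean inputs $\mathbf{x}_t \in \mathbb{R}^n$, produce outputs $\mathbf{y}_t$ whose covariance is the identity, $\mathbb{E}[\mathbf{y}_t \mathbf{y}_t^\top] = \mathbf{I}_n$. The direct-recurrence network learns a single $n \times n$ lateral weight matrix that enforces this constraint, whereas the interneuron circuit routes the recurrent signal through $k$ interneurons, effectively factorizing the lateral weights into a product of two rectangular matrices; it is this factorization that overparameterizes the objective, in analogy with added depth in feedforward linear networks.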
Mosquito-borne diseases (MBDs), such as dengue virus, chikungunya virus, and West Nile virus, cause more than one million deaths globally every year. Because many of these diseases are spread by Aedes and Culex mosquitoes, tracking their larvae is critical to mitigating the spread of MBDs. Even as citizen science grows and yields larger mosquito image datasets, manual annotation of mosquito images is becoming ever more time-consuming and inefficient. Prior research has used computer vision to identify mosquito species, and convolutional neural networks (CNNs) have become the de facto standard for image classification. However, these models often require substantial computational resources. This study introduces the application of vision transformers (ViTs) in a comparative study aimed at improving the image classification of Aedes and Culex larvae. Two ViT models, ViT-Base and CvT-13, and two CNN models, ResNet-18 and ConvNeXT, were trained on mosquito larvae image data and compared to determine the most effective model for distinguishing larvae as Aedes or Culex. Testing showed that ConvNeXT achieved the best values on all classification metrics, demonstrating its viability for mosquito larva classification. Based on these results, future work includes creating a model designed specifically for mosquito larva classification by combining elements of CNN and transformer architectures.
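A hedged sketch of the comparison setup follows: both model families are given the same two-way (Aedes vs. Culex) classification head and trained identically. The constructors are torchvision's; the dataset, transforms, and training loop are omitted.

import torch.nn as nn
from torchvision import models

def binary_head(model):
    # Replace the final classifier with a 2-way output (Aedes, Culex).
    if hasattr(model, "heads"):  # ViT-style classification head
        model.heads.head = nn.Linear(model.heads.head.in_features, 2)
    else:                        # ConvNeXt-style classifier
        model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 2)
    return model

# weights=None keeps the sketch offline; pretrained initialization can be
# requested instead (e.g., weights="IMAGENET1K_V1").
vit = binary_head(models.vit_b_16(weights=None))
cnn = binary_head(models.convnext_tiny(weights=None))
# Both are then trained with the same loop and compared on accuracy,
# precision, recall, and F1 over the larvae test set.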
We present a self-supervised training approach, based on image augmentations, for learning view-invariant dense visual descriptors. Unlike existing works, which often require complex datasets such as registered RGBD sequences, we train on an unordered set of RGB images. This allows learning from a single camera view, for example in an existing robotic cell with a fixed-mounted camera. We create synthetic views and dense pixel correspondences using data augmentations. Despite the simpler data recording and setup requirements, we find that our descriptors are competitive with existing methods. We show that training on synthetic correspondences provides descriptor consistency across a broad range of camera views. We compare against training with geometric correspondences obtained from multiple views and provide ablation studies. We also demonstrate a robot bin-picking experiment in which descriptors learned from a fixed camera are used to define grasp preferences.
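The core data-generation step can be sketched as follows (our illustration, not the authors' code): a known affine warp produces a synthetic second view, and the warp itself supplies a dense pixel correspondence for every source pixel.

import numpy as np
import cv2

def synthetic_pair(image: np.ndarray, angle: float = 15.0, scale: float = 1.1):
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # 2x3 affine
    warped = cv2.warpAffine(image, M, (w, h))                  # synthetic view
    # Dense correspondences: each source pixel (x, y) maps to M @ [x, y, 1].
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    targets = (M @ pts).T.reshape(h, w, 2)  # target (x', y') per source pixel
    return warped, targets

image = np.random.randint(0, 255, (120, 160, 3), np.uint8)
warped, correspondences = synthetic_pair(image)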